
About the Provider
DeepSeek is a Chinese artificial intelligence company based in Hangzhou, Zhejiang, that focuses on research and development of large language models and advanced AI technologies. The firm emphasizes open innovation in AI, publishing models and research under permissive licenses to make powerful language models widely accessible and to support collaborative development in the global AI community.

Model Quickstart
This section helps you quickly get started with the deepseek-ai/deepseek-r1-distill-llama-70b model on the Qubrid AI inferencing platform.
To use this model, you need:
- A valid Qubrid API key
- Access to the Qubrid inference API
- Basic knowledge of making API requests in your preferred language
With these in place, you can send requests to the deepseek-ai/deepseek-r1-distill-llama-70b model and receive responses based on your input prompts.
Below are example placeholders showing how the model can be accessed from different programming environments. You can choose the one that best fits your workflow.
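As a starting point, here is a minimal Python sketch of such a request. The endpoint URL, response shape, and field names below are assumptions modeled on common OpenAI-style chat APIs, not confirmed Qubrid specifics; check the Qubrid API reference for the exact base URL and schema.

```python
import os
import requests

# NOTE: placeholder URL -- consult the Qubrid API reference for the
# actual base URL and request schema.
API_URL = "https://api.qubrid.ai/v1/chat/completions"


def build_payload(prompt: str, temperature: float = 0.3,
                  max_tokens: int = 10000) -> dict:
    """Build a chat-completion request body for the distilled model."""
    return {
        "model": "deepseek-ai/deepseek-r1-distill-llama-70b",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def ask(prompt: str) -> str:
    """Send a prompt and return the generated text.

    Assumes an OpenAI-style response shape; verify against the
    Qubrid docs before relying on it.
    """
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['QUBRID_API_KEY']}"},
        json=build_payload(prompt),
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]


# Example: inspect the request body before sending.
print(build_payload("Explain model distillation in one sentence."))
```

The payload builder is separated from the network call so you can log or adjust the request body before dispatching it.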
Model Overview
DeepSeek R1 Distill Llama 70B is a distilled large language model optimized for efficient, high-level reasoning and conversational intelligence. It is trained by distilling high-quality reasoning outputs from DeepSeek-R1 into a 70B LLaMA-based architecture, delivering near-frontier analytical performance while running on significantly smaller hardware than full-scale models.

Model at a Glance
| Feature | Details |
|---|---|
| Model ID | deepseek-ai/deepseek-r1-distill-llama-70b |
| Architecture | LLaMA-3.1-70B (Distilled) |
| Model Size | 70B parameters |
| Training Data | Distilled from DeepSeek R1 high-quality reasoning outputs with LLaMA 70B |
| Context Length | 64K tokens |
When to use?
Use DeepSeek R1 Distill Llama 70B if you need:
- Strong reasoning and chain-of-thought capabilities for complex tasks
- Long-context support up to 64K tokens
- Efficient deployment compared to full, non-distilled frontier models
- Open-source licensing suitable for on-premise or custom deployments
- Reliable performance across math, logic, coding, and research workflows
Inference Parameters
| Parameter Name | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Enable streaming responses for real-time output. |
| Temperature | number | 0.3 | Controls creativity and randomness; higher values produce more diverse output. |
| Max Tokens | number | 10000 | Defines the maximum number of tokens the model is allowed to generate. |
| Top P | number | 1 | Nucleus sampling; restricts token selection to the smallest set of tokens whose cumulative probability reaches the given mass. |
| Reasoning Effort | select | medium | Adjusts the depth of reasoning and problem-solving effort; higher values increase response quality at the cost of latency. |
| Reasoning Summary | select | auto | Controls verbosity of reasoning explanations: auto, concise, or detailed. |
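The parameters above might map onto a request body as sketched below. The snake_case field names are assumptions following common OpenAI-style conventions; confirm the exact names against the Qubrid API reference.

```python
# Hypothetical request body illustrating the inference parameters above.
# Field names are assumed, not confirmed Qubrid API specifics.
params = {
    "model": "deepseek-ai/deepseek-r1-distill-llama-70b",
    "stream": True,                # Streaming: emit tokens as they are generated
    "temperature": 0.3,            # lower = more deterministic output
    "max_tokens": 10000,           # upper bound on generated tokens
    "top_p": 1,                    # nucleus sampling mass (1 = no truncation)
    "reasoning_effort": "medium",  # deeper reasoning raises quality and latency
    "reasoning_summary": "auto",   # auto / concise / detailed
}
print(params)
```

With `stream` set to true, the response typically arrives as incremental chunks rather than a single JSON body, so your client must read the connection accordingly.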
Key Features
- High-Quality Reasoning: Optimized for strong reasoning and chain-of-thought capabilities, suitable for complex tasks.
- Long-Context Support: Can handle up to 64K tokens, enabling processing of very large inputs.
- Efficient Deployment: Distilled model runs efficiently compared to full 70B models, reducing hardware requirements.
- Configurable Inference: Supports adjustable parameters like temperature, streaming, reasoning effort, and verbosity for flexible and precise outputs.